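The need for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community. However, much research effort has focused on developing practical MARL algorithms whose effectiveness has been studied only empirically, and which therefore lack theoretical guarantees. As recent studies have shown, MARL methods often achieve performance that is unstable in terms of reward monotonicity or suboptimal at convergence. To resolve these issues, in this paper we introduce a novel framework named Heterogeneous-Agent Mirror Learning (HAML) that provides a general template for MARL algorithm design. We prove that algorithms derived from the HAML template satisfy the desired properties of monotonic improvement of the joint reward and convergence to a Nash equilibrium. We verify the practicality of HAML by proving that the current state-of-the-art cooperative MARL algorithms, HATRPO and HAPPO, are in fact HAML instances. Next, as a natural outcome of our theory, we propose HAML extensions of two well-known RL algorithms, HAA2C (for A2C) and HADDPG (for DDPG), and demonstrate their effectiveness against strong baselines on StarCraftII and Multi-Agent MuJoCo tasks.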
translated by Google Translate
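Large sequence models (SMs) such as the GPT series and BERT have displayed outstanding performance and generalization capabilities on vision, language, and, more recently, reinforcement learning tasks. A natural follow-up question is how to abstract multi-agent decision making into an SM problem and benefit from the rapid development of SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) as an SM problem, in which the task is to map the agents' observation sequence to the agents' optimal action sequence. Our goal is to build a bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to MAT is an encoder-decoder architecture that leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision-making process. This yields only linear time complexity for multi-agent problems and, most importantly, endows MAT with a monotonic performance improvement guarantee. Unlike prior art such as Decision Transformer, which fits only pre-collected offline data, MAT is trained by online trial and error in the environment. To validate MAT, we conduct extensive experiments on the StarCraftII, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. The results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that MAT is an excellent few-shot learner on unseen tasks, regardless of changes in the number of agents. See our project page at https://sites.google.com/view/multi-agent-transformer.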
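As a rough illustration of the multi-agent advantage decomposition mentioned above, the sketch below checks numerically, for two agents in a single state, that the joint advantage equals the first agent's advantage plus the second agent's advantage conditioned on the first agent's action. The Q-values, the policies, and all function names are made up for this toy example and are not taken from the paper.

```python
# Toy check of the advantage decomposition for two agents in one fixed state.
# All numbers and names here are illustrative assumptions, not the paper's setup.

# Joint action-value Q(a1, a2) for the fixed state.
q = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 0.0, (1, 1): 4.0}

# Independent per-agent policies over two actions each.
pi1 = {0: 0.5, 1: 0.5}
pi2 = {0: 0.25, 1: 0.75}


def q_agent1(a1):
    """Q-value of agent 1's action, averaging over agent 2's policy."""
    return sum(pi2[a2] * q[(a1, a2)] for a2 in pi2)


# State value: expectation of Q under the joint policy.
value = sum(pi1[a1] * q_agent1(a1) for a1 in pi1)


def adv_agent1(a1):
    """Advantage of agent 1 acting first."""
    return q_agent1(a1) - value


def adv_agent2(a1, a2):
    """Advantage of agent 2's action given agent 1's already chosen action."""
    return q[(a1, a2)] - q_agent1(a1)


def joint_adv(a1, a2):
    """Advantage of the joint action."""
    return q[(a1, a2)] - value


# Largest deviation between the joint advantage and the sequential sum.
max_gap = max(
    abs(joint_adv(a1, a2) - (adv_agent1(a1) + adv_agent2(a1, a2)))
    for (a1, a2) in q
)
```

Because the sum telescopes, `max_gap` is zero up to floating-point error, which is what allows agents to choose actions one after another instead of searching the joint action space at once.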
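General policy improvement (GPI) and trust-region learning (TRL) are the predominant frameworks within contemporary reinforcement learning (RL), serving as the core models for solving Markov decision processes (MDPs). Unfortunately, in their mathematical form they are sensitive to modifications, so the practical instantiations that implement them do not automatically inherit their improvement guarantees. As a result, the spectrum of available rigorous MDP solvers is narrow. Indeed, many state-of-the-art (SOTA) algorithms, such as TRPO and PPO, are not proven to converge. In this paper, we propose mirror learning, a general solution to the RL problem. We reveal GPI and TRL to be but small points within this far greater space of algorithms, which boasts the monotonic improvement property and converges to the optimal policy. We show that virtually all SOTA algorithms for RL are instances of mirror learning, which suggests that their empirical performance is a consequence of their theoretical properties rather than of approximate analogies. Excitingly, we show that mirror learning opens up a whole new space of policy learning methods with convergence guarantees.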
We study the problem of graph clustering under a broad class of objectives in which the quality of a cluster is defined based on the ratio between the number of edges in the cluster, and the total weight of vertices in the cluster. We show that our definition is closely related to popular clustering measures, namely normalized associations, which is a dual of the normalized cut objective, and normalized modularity. We give a linear time constant-approximate algorithm for our objective, which implies the first constant-factor approximation algorithms for normalized modularity and normalized associations.
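To make the ratio in this objective concrete, here is a minimal sketch that scores one candidate cluster as the number of edges inside it divided by its total vertex weight. The function name, the graph, and the vertex weights are hypothetical, and the paper's exact objective over a full clustering may differ from this single-cluster reading.

```python
from typing import Dict, Iterable, Set, Tuple


def cluster_ratio(
    edges: Iterable[Tuple[str, str]],
    vertex_weight: Dict[str, float],
    cluster: Set[str],
) -> float:
    """Edges with both endpoints in `cluster`, divided by the cluster's total vertex weight."""
    internal_edges = sum(1 for u, v in edges if u in cluster and v in cluster)
    total_weight = sum(vertex_weight[v] for v in cluster)
    if total_weight == 0:
        raise ValueError("cluster has zero total vertex weight")
    return internal_edges / total_weight


# Illustrative graph: a triangle a-b-c with a pendant vertex d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
weights = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 1.0}

triangle_score = cluster_ratio(edges, weights, {"a", "b", "c"})  # 3 edges / weight 4.0
whole_graph_score = cluster_ratio(edges, weights, set(weights))  # 4 edges / weight 5.0
```

How the vertex weights are chosen determines which clustering measure a ratio of this kind relates to; the weights above are arbitrary.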
We study the problem of combining neural networks with symbolic reasoning. Recently introduced frameworks for Probabilistic Neurosymbolic Learning (PNL), such as DeepProbLog, perform exponential-time exact inference, limiting the scalability of PNL solutions. We introduce Approximate Neurosymbolic Inference (A-NeSI): a new framework for PNL that uses neural networks for scalable approximate inference. A-NeSI 1) performs approximate inference in polynomial time without changing the semantics of probabilistic logics; 2) is trained using data generated by the background knowledge; 3) can generate symbolic explanations of predictions; and 4) can guarantee the satisfaction of logical constraints at test time, which is vital in safety-critical applications. Our experiments show that A-NeSI is the first end-to-end method to scale the Multi-digit MNISTAdd benchmark to sums of 15 MNIST digits, up from 4 in competing systems. Finally, our experiments show that A-NeSI achieves explainability and safety without a penalty in performance.
This paper presents a conversational AI platform called Flowstorm. Flowstorm is an open-source SaaS project suitable for creating, running, and analyzing conversational applications. Thanks to the fast and fully automated build process, dialogues created within the platform can be executed in seconds. Furthermore, we propose a novel dialogue architecture that combines tree structures with generative models. The tree structures are also used for training NLU models suited to specific dialogue scenarios. The generative models, by contrast, are used globally across applications and extend the functionality of the dialogue trees. Moreover, the platform benefits from out-of-the-box components, such as those responsible for extracting data from utterances or working with crawled data. Additionally, it can be extended with custom code directly in the platform. One of the essential features of the platform is the ability to reuse created assets across applications. There is a library of prepared assets to which each developer can contribute. All of the features are available through a user-friendly visual editor.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
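For readers unfamiliar with patch-based training, the sketch below shows one simple way to tile an image that is too large to process at once: it lists the top-left corners of fixed-size windows that together cover the whole image. The function name, the stride parameter, and the border handling are assumptions made for illustration; the survey does not describe how individual participants implemented patching.

```python
from typing import List, Tuple


def patch_origins(height: int, width: int, patch: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left corners of patch-by-patch windows that cover a height-by-width image."""
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    if patch > height or patch > width:
        raise ValueError("patch is larger than the image")

    def starts(size: int) -> List[int]:
        # Regular grid of start positions along one axis.
        positions = list(range(0, size - patch + 1, stride))
        # Add a final window flush with the border so no pixels are left uncovered.
        if positions[-1] != size - patch:
            positions.append(size - patch)
        return positions

    return [(row, col) for row in starts(height) for col in starts(width)]


# A 5x5 image tiled with 2x2 patches at stride 2: start positions 0, 2, 3 on each axis.
origins = patch_origins(5, 5, patch=2, stride=2)
```

Each origin `(row, col)` identifies the window `image[row:row + patch, col:col + patch]`, so a model sees only one patch at a time. The last window on each axis overlaps its neighbour instead of running past the border.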
Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state-of-the-art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with fewer than 100 billion parameters via knowledge distillation. Specifically, we finetune a student model on the chain of thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense, and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on PaLM-540B generated chains of thought.
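The data-preparation step behind this kind of distillation can be sketched as follows: each teacher-generated chain of thought is paired with its question to form a (source, target) example for finetuning the student. The `TeacherSample` type, the prompt template, and the sample text are hypothetical; the abstract does not specify the exact format used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TeacherSample:
    question: str
    chain_of_thought: str  # reasoning text produced by the teacher model
    answer: str            # final answer produced by the teacher model


def to_finetune_pairs(samples: List[TeacherSample]) -> List[Tuple[str, str]]:
    """Turn teacher outputs into (source, target) text pairs for the student."""
    pairs = []
    for sample in samples:
        source = f"Q: {sample.question}\nA:"
        # The target contains the reasoning as well as the answer, so the
        # student is trained to produce the chain of thought itself.
        target = f"{sample.chain_of_thought} The answer is {sample.answer}."
        pairs.append((source, target))
    return pairs


# Made-up teacher output used only to show the resulting format.
samples = [TeacherSample("What is 2 + 3?", "2 plus 3 equals 5.", "5")]
pairs = to_finetune_pairs(samples)
```

The resulting pairs can be fed to any sequence-to-sequence finetuning loop. The key design choice is that the target text includes the intermediate reasoning rather than only the final answer.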
In the field of derivative-free optimization, both main branches, deterministic and nature-inspired techniques, have advanced substantially in recent years. In this paper, we provide an extensive computational comparison of selected methods from each of these branches. The chosen representatives were either standard and well-utilized methods or the best-performing methods from recent numerical comparisons. The computational comparison was performed on five different benchmark sets, and the results were analyzed in terms of performance, time complexity, and convergence properties of the selected methods. The results showed that, when objective function evaluations are relatively cheap, the nature-inspired methods perform significantly better than their deterministic counterparts. However, when function evaluations are costly or otherwise prohibited, the deterministic methods might provide more consistent and overall better results.
Deep learning surrogate models are being increasingly used in accelerating scientific simulations as a replacement for costly conventional numerical techniques. However, their use remains a significant challenge when dealing with real-world complex examples. In this work, we demonstrate three types of neural network architectures for efficient learning of highly non-linear deformations of solid bodies. The first two architectures are based on the recently proposed CNN U-NET and MAgNET (graph U-NET) frameworks which have shown promising performance for learning on mesh-based data. The third architecture is Perceiver IO, a very recent architecture that belongs to the family of attention-based neural networks--a class that has revolutionised diverse engineering fields and is still unexplored in computational mechanics. We study and compare the performance of all three networks on two benchmark examples, and show their capabilities to accurately predict the non-linear mechanical responses of soft bodies.